Protein sequence-structure compatibility criteria in terms of statistical hypothesis testing.

Authors

  • S Sunyaev
  • E Kuznetsov
  • I Rodchenkov
  • V Tumanyan
Abstract

The assignment of query protein sequences to probable folds in a threading approach is based on the statistical analysis (learning) of structural properties of amino acids in known protein structures. We formalize the recognition problem in terms of mathematical statistics, namely statistical hypothesis testing. This general formulation leads to various mathematical forms of a decision rule function for evaluating the quality of a sequence-structure fit. Three criteria were derived using a likelihood ratio approach. Two of them have new functional forms, while the third coincides with the mean force potential function previously derived under the additional assumption of the Boltzmann law. The new decision rule functions employ (i) the Parzen estimator of a probability density and (ii) a newly introduced non-parametric statistic with a known asymptotic distribution. We compared the efficiency of the criteria in a 'structure seeks sequence' search: three highly populated template folds were searched against a query library of non-homologous sequences of proteins with known 3D structure, using residue accessibility as the environmental variable. The criteria reflect different underlying statistical propositions and thus often recognize different correct sequence-structure matches. On the other hand, if an amino acid sequence is recognized as compatible with a template by all three decision rules, a more reliable inference of the sequence-structure relationship appears possible, since almost all false positives obtained by the three criteria differ.
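The abstract does not spell out the decision rules, so the following is only a minimal Python sketch under stated assumptions, not the authors' method. It scores an ungapped query-to-template alignment with two likelihood-ratio criteria, using relative residue accessibility in [0, 1] as the environmental variable. The first discretizes accessibility into bins, and its log-odds terms have the same form as a mean force potential (up to sign and a kT factor). The second estimates the densities with a Gaussian-kernel Parzen estimator. The paper's third criterion, the new non-parametric statistic, is omitted because its form is not given here. The bin count, pseudocount, bandwidth `h`, zero thresholds, toy training data, and all function names (`make_binned_scorer`, `make_parzen_scorer`, `consensus_hit`) are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical helpers (assumptions of this sketch): training data are
# (amino_acid, relative_accessibility) pairs with accessibility in [0, 1],
# and the query is aligned to the template position by position, without gaps.

def make_binned_scorer(training, n_bins=3, pseudo=1.0):
    """Log-likelihood ratio with accessibility split into equal-width bins."""
    def bin_of(x):
        return min(int(x * n_bins), n_bins - 1)

    aas = sorted({aa for aa, _ in training})
    counts = Counter((aa, bin_of(acc)) for aa, acc in training)
    total = len(training) + pseudo * len(aas) * n_bins
    # Pseudocounts keep every (amino acid, bin) probability above zero.
    p_joint = {(a, b): (counts[(a, b)] + pseudo) / total
               for a in aas for b in range(n_bins)}
    p_aa = {a: sum(p_joint[(a, b)] for b in range(n_bins)) for a in aas}
    p_bin = {b: sum(p_joint[(a, b)] for a in aas) for b in range(n_bins)}
    # log p(a, b) / (p(a) p(b)): same form as a mean force potential in units of -kT.
    table = {(a, b): math.log(p / (p_aa[a] * p_bin[b]))
             for (a, b), p in p_joint.items()}

    def score(seq, template):
        return sum(table[(aa, bin_of(e))] for aa, e in zip(seq, template))
    return score

def parzen_density(samples, h):
    """Gaussian-kernel Parzen estimate of a one-dimensional density."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)
    return density

def make_parzen_scorer(training, h=0.1):
    """Log-likelihood ratio sum_i log f(e_i | a_i) / f(e_i) with Parzen densities."""
    by_aa = defaultdict(list)
    for aa, acc in training:
        by_aa[aa].append(acc)
    f_all = parzen_density([acc for _, acc in training], h)
    f_given = {aa: parzen_density(v, h) for aa, v in by_aa.items()}

    def score(seq, template):
        return sum(math.log(f_given[aa](e) / f_all(e)) for aa, e in zip(seq, template))
    return score

def consensus_hit(seq, template, scorers, thresholds):
    """Accept a match only if every criterion exceeds its own threshold."""
    return all(s(seq, template) > t for s, t in zip(scorers, thresholds))

# Toy usage: L is mostly buried, K mostly exposed (invented illustrative data).
training = [("L", 0.05), ("L", 0.10), ("L", 0.15), ("L", 0.20),
            ("K", 0.60), ("K", 0.70), ("K", 0.80), ("K", 0.90)]
template = [0.10, 0.85, 0.10, 0.75]  # buried, exposed, buried, exposed
scorers = [make_binned_scorer(training), make_parzen_scorer(training, h=0.1)]
print(consensus_hit("LKLK", template, scorers, [0.0, 0.0]))  # True
print(consensus_hit("KLKL", template, scorers, [0.0, 0.0]))  # False
```

The `consensus_hit` rule loosely mirrors the abstract's observation that requiring agreement between differently founded criteria helps filter false positives. In practice each threshold would need calibration against scores of non-matching sequences, which this sketch does not attempt.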


Similar sources

In Silico Analysis of Primary Sequence and Tertiary Structure of Lepidium Draba Peroxidase

Peroxidase enzymes are widely applicable in industry and diagnosis. Recently, we introduced a new kind of peroxidase gene from Lepidium draba (LDP). According to protein multiple sequence alignment results, LDP had 93% similarity and 88.96% identity with horseradish peroxidase C1A (HRP C1A). In the current study we employed in silico tools to determine to which group of peroxidase enzymes LDP...


Acceptance sampling for attributes via hypothesis testing and the hypergeometric distribution

This paper questions some aspects of attribute acceptance sampling in light of the original concepts of hypothesis testing from Neyman and Pearson (NP). Attribute acceptance sampling in industry, as developed by Dodge and Romig (DR), generally follows the international standards of ISO 2859, and similarly the Brazilian standards NBR 5425 to NBR 5427 and the United States Standards ANSI/ASQC Z1....


Pareto-based Multi-criteria Evolutionary Algorithm for Parallel Machines Scheduling Problem with Sequence-dependent Setup Times

This paper addresses an unrelated multi-machine scheduling problem with sequence-dependent setup time, release date and processing set restriction to minimize the sum of weighted earliness/tardiness penalties and the sum of completion times, which is known to be NP-hard. A Mixed Integer Programming (MIP) model is proposed to formulate the considered multi-criteria problem. Also, to solve the mo...


Explaining the relationship between the compatibility of mosques' ancillary functions with the worship function and their location relative to the main worship spaces

The first step in the architectural design of any building is extracting its physical program. Mosques can accommodate various functions at a larger scale. Selecting these functions and the locations where they are placed are the most challenging parts of mosque design. The choice of functions in the mosques of Islamic Iranian cities should be based on priorities that are consistent with Shar...


TESTING STATISTICAL HYPOTHESES UNDER FUZZY DATA AND BASED ON A NEW SIGNED DISTANCE

This paper deals with the problem of testing statistical hypotheses when the available data are fuzzy. In this approach, we first obtain a fuzzy test statistic based on fuzzy data, and then, based on a new signed distance between fuzzy numbers, we introduce a new decision rule to accept/reject the hypothesis of interest. The proposed approach is investigated for two cases: the case without nuisance p...



Journal title:
  • Protein Engineering

Volume 10, Issue 6

Publication date: 1997